feat: sliding-window frame parser with per-type resync tracking#47
Conversation
Replace the all-or-nothing frame parser with a sliding-window resync: on a bad header checksum, control bytes, length, end byte, or data checksum the parser discards a single byte and retries, recovering valid frames from stutters, cross-device collisions, and line noise (issue marklynch#46). Framing logic moves into a dedicated, host-testable framing.c module; error accounting is reworked from message-error counters into per-type resync counters surfaced on the status page and tagged through the unknown-message buffer. Tests: golden-file harness seeded with the real captures from the issue, plus chunked-input cases (| splits one case into multiple add_bytes calls) and a chunk-invariance property sweep proving frame recovery is independent of how the byte stream is fragmented across reads.
marklynch
left a comment
There was a problem hiding this comment.
Nice work — the sliding-window resync is a clear improvement, the framing logic reads correctly (loop terminates, bounds are safe, capture/dedup behavior is sound), and pulling it into a host-testable framing.c with golden + chunk-invariance tests is great. A few notes:
1. unknown_msgs_json_handler holds the unknown-buffer mutex across the whole JSON build (medium)
unknown_buffer_lock_for_read() keeps the mutex held while serializing up to UNKNOWN_BUFFER_CAPACITY entries (cJSON allocations, snprintf, get_device_name). Meanwhile unknown_buffer_record() runs on the hot RX/resync path with a 100 ms timeout and drops the record on contention. Loading the Unknown Messages page while the bus is noisy can therefore both lose captures and add latency to the bridge task. Consider snapshotting the entries under the lock, releasing, then building the JSON from the copy.
2. resync_wrapper counter switch has no default (low)
The counter-incrementing switch (type) lacks a default, while the reason-mapping switch just below it has one. If a new tcp_bridge_resync_type_t is added later, resyncs_total still increments but no per-type counter does, so total would no longer equal the sum of the breakdown (and there's no compile-time nudge to catch it).
3. decode_message no longer validates header/length/data checksums (low)
Validation now lives solely in the framing layer, which is correct for RX since framing guarantees validity before dispatch. But decode_message is also called directly on the TX-echo path (tcp_bridge.c:330, user-typed hex) and from the test harness, so malformed input there is now decoded with no checksum warning where it previously was flagged. Low impact — mostly worth noting the new "assumes pre-validated input" invariant for decode_message.
Generated by Claude Code
|
This is great work and should make it a way better. Thanks. Claude review above, the key one I think should address is the first one, as dropping packets could lead to a whole world of pain debugging, particularly if one of the debug pages causes it. |
|
Have just seen #48 also, so if you want to stack any of the suggestions below on that can merge it all then. |
This patch got a little larger than I initially planned. The guts (in framing.c) is reasonably straightforward. I did break it out into its own framing.c file so I could write all the tests against it. The tests are reasonably comprehensive - they test resync/recovery, and as a bonus it's now easier to test reassembly across multiple transport reads, so I added that too.
I upgraded the output in the 'unknown messages' UI to display all the resync types. It might be a bit noisy, but it allows eyeballing to see what the common causes of resync are.
Replace the all-or-nothing frame parser with a sliding-window resync: on a bad header checksum, control bytes, length, end byte, or data checksum the parser discards a single byte and retries, recovering valid frames from stutters, cross-device collisions, and line noise (issue #46).
Framing logic moves into a dedicated, host-testable framing.c module; error accounting is reworked from message-error counters into per-type resync counters surfaced on the status page and tagged through the unknown-message buffer.
Tests: golden-file harness seeded with the real captures from the issue, plus chunked-input cases (| splits one case into multiple add_bytes calls) and a chunk-invariance property sweep proving frame recovery is independent of how the byte stream is fragmented across reads.
Closes #46